Verification of Query Completeness over Processes [Extended Version]
Data completeness is an essential aspect of data quality, and has in turn a
huge impact on the effective management of companies. For example, statistics
are computed and audits are conducted in companies under the implicit, strong
assumption that the analysed data are complete. In this work, we study the
problem of completeness of data produced by business processes, with the aim of
automatically assessing whether a given database query
can be answered with complete information in a certain state of the process. We
formalize so-called quality-aware processes that create data in the real world
and store it in the company's information system, possibly at a later point.
Comment: Extended version of a paper that was submitted to BPM 201
Mapping and Cleaning Open Commonsense Knowledge Bases with Generative Translation
Structured knowledge bases (KBs) are the backbone of many
knowledge-intensive applications, and their automated construction has
received considerable attention. In particular, open information extraction
(OpenIE) is often used to induce structure from text. However, although it
allows high recall, the extracted knowledge tends to inherit noise from the
sources and the OpenIE algorithm. Moreover, OpenIE tuples contain an open-ended,
non-canonicalized set of relations, making downstream exploitation of the
extracted knowledge harder. In this paper, we study the problem of mapping an open KB
into the fixed schema of an existing KB, specifically for the case of
commonsense knowledge. We propose approaching the problem by generative
translation, i.e., by training a language model to generate fixed-schema
assertions from open ones. Experiments show that this approach occupies a sweet
spot between traditional manual, rule-based, or classification-based
canonicalization and purely generative KB construction like COMET. Moreover, it
produces higher mapping accuracy than the former while avoiding the
association-based noise of the latter.
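The generative-translation idea above can be sketched as a pipeline: an open tuple is serialized into a source string, a trained sequence-to-sequence model emits a fixed-schema assertion, and the output is parsed back into a triple. The serialization format, the relation names, and the rule-based stand-in for the model below are illustrative assumptions, not the paper's actual system.

```python
# Minimal sketch of open-KB canonicalization framed as generative translation.
# In the paper's setting, translate() would be a fine-tuned language model
# generating fixed-schema (e.g., ConceptNet-style) assertions; here a toy
# rule table stands in for it so the sketch is runnable.

def serialize_open(triple):
    """Turn an OpenIE tuple into the model's source text."""
    subj, rel, obj = triple
    return f"{subj} ; {rel} ; {obj}"

def parse_fixed(text):
    """Parse the model's output back into a fixed-schema triple."""
    head, relation, tail = (part.strip() for part in text.split(";"))
    return head, relation, tail

def translate(source_text):
    """Stand-in for a trained seq2seq model (illustrative rules only)."""
    toy_rules = {"is capable of": "CapableOf", "is used for": "UsedFor"}
    subj, rel, obj = (p.strip() for p in source_text.split(";"))
    return f"{subj} ; {toy_rules.get(rel, 'RelatedTo')} ; {obj}"

source = serialize_open(("knife", "is used for", "cutting"))
print(parse_fixed(translate(source)))  # ('knife', 'UsedFor', 'cutting')
```

Treating canonicalization as text generation lets one model handle both relation mapping and argument rephrasing, which is what distinguishes it from rule-based or classification-based mapping.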
Cardinal Virtues: Extracting Relation Cardinalities from Text
Information extraction (IE) from text has largely focused on relations
between individual entities, such as who has won which award. However, some
facts are never fully mentioned, and no IE method has perfect recall. Thus, it
is beneficial to also tap content about the cardinalities of these relations,
for example, how many awards someone has won. We introduce this novel problem
of extracting cardinalities and discuss the specific challenges that set it
apart from standard IE. We present a distant supervision method using
conditional random fields. A preliminary evaluation results in precision
between 3% and 55%, depending on the difficulty of relations.
Comment: 5 pages, ACL 2017 (short paper
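The distant-supervision step can be illustrated with a small sketch. Under the (assumed) labeling scheme below, a sentence about a subject counts as a positive training example for a relation's cardinality if it contains a cardinal number matching the count of objects stored in the KB; the number-word list and the example sentence are illustrative.

```python
import re

# Hedged sketch of distant-supervision labeling for cardinality extraction:
# align KB object counts with numbers mentioned in text to obtain (noisy)
# training labels for a sequence model such as a CRF.

WORD_NUMS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def find_cardinals(sentence):
    """Collect cardinal numbers, both digits and common number words."""
    nums = [int(t) for t in re.findall(r"\b\d+\b", sentence)]
    nums += [WORD_NUMS[w] for w in re.findall(r"\b\w+\b", sentence.lower())
             if w in WORD_NUMS]
    return nums

def label_sentence(sentence, kb_count):
    """Positive iff some cardinal in the sentence equals the KB object count."""
    return kb_count in find_cardinals(sentence)

print(label_sentence("She has won two Academy Awards.", 2))  # True
```

Such alignment is inherently noisy (the KB count may be incomplete, and a matching number may be coincidental), which is consistent with the wide precision range the abstract reports.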
How Stable is Knowledge Base Knowledge?
Knowledge Bases (KBs) provide a structured representation of the real world in
the form of extensive collections of facts about real-world entities, their
properties, and relationships. They are ubiquitous in large-scale intelligent
systems that exploit structured information, in tasks such as structured
search, question answering, and reasoning, and hence their data quality becomes
paramount. The inevitability of change in the real world brings us to a
central property of KBs -- they are highly dynamic, in that the information
they contain is constantly subject to change. In other words, KBs are unstable.
In this paper, we investigate the notion of KB stability, specifically, the
problem of KBs changing due to real-world change. Some entity-property-pairs do
not undergo change in reality anymore (e.g., Einstein-children or
Tesla-founders), while others might well change in the future (e.g.,
Tesla-board member or Ronaldo-occupation as of 2022). This notion of real-world
grounded change is different from other changes that affect the data only,
notably correction and delayed insertion, which have received attention in data
cleaning, vandalism detection, and completeness estimation already. To analyze
KB stability, we proceed in three steps. (1) We present heuristics to delineate
changes due to world evolution from delayed completions and corrections, and
use these to study the real-world evolution behaviour of diverse Wikidata
domains, finding a high skew in terms of properties. (2) We evaluate heuristics
to identify entities and properties likely to not change due to real-world
change, and filter inherently stable entities and properties. (3) We evaluate
the possibility of predicting stability post-hoc, specifically predicting
change in a property of an entity, finding that this is possible with up to 83%
F1 score, on a balanced binary stability prediction task.
Comment: Incomplete draft. 12 page
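Step (1), delineating real-world evolution from corrections, can be sketched with a simple heuristic over a property's revision history. The rule below is our illustrative assumption, not the paper's exact heuristic: a value that is removed and later re-added looks like a data-only correction, whereas values that are only ever added look like genuine real-world change.

```python
# Illustrative heuristic (an assumption, not the paper's rule) for separating
# real-world evolution from data-only corrections in an entity-property
# edit history, given as a list of value sets, oldest revision first.

def classify_history(revisions):
    """Return 'correction', 'evolution', or 'unclear' for one property."""
    for i in range(1, len(revisions)):
        removed = revisions[i - 1] - revisions[i]
        later = set().union(*revisions[i:])
        if removed & later:
            return "correction"   # a removed value bounced back: data noise
    if all(revisions[i - 1] <= revisions[i] for i in range(1, len(revisions))):
        return "evolution"        # monotone growth, e.g. a new board member
    return "unclear"              # permanent removal: needs closer inspection

print(classify_history([{"A"}, set(), {"A"}]))   # correction
print(classify_history([{"A"}, {"A", "B"}]))     # evolution
```

A real system would additionally use edit timestamps, qualifiers, and editor metadata; this sketch only captures the bounce-back intuition.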
Extracting Multi-valued Relations from Language Models
The widespread usage of latent language representations via pre-trained
language models (LMs) suggests that they are a promising source of structured
knowledge. However, existing methods focus only on a single object per
subject-relation pair, even though often multiple objects are correct. To
overcome this limitation, we analyze these representations for their potential
to yield materialized multi-object relational knowledge. We formulate the
problem as a rank-then-select task. For ranking candidate objects, we evaluate
existing prompting techniques and propose new ones incorporating domain
knowledge. Among the selection methods, we find that choosing objects with a
likelihood above a learned relation-specific threshold gives a 49.5% F1 score.
Our results highlight the difficulty of employing LMs for the multi-valued
slot-filling task and pave the way for further research on extracting
relational knowledge from latent language representations.
Comment: Accepted to Repl4NLP Workshop at ACL 202
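The rank-then-select scheme described above can be sketched as follows: candidate objects come with LM likelihoods, objects scoring above a relation-specific threshold are selected, and the threshold is learned on held-out data. The grid-search procedure and all names and scores below are illustrative assumptions.

```python
# Minimal sketch of rank-then-select with a learned relation-specific
# threshold. Candidate likelihoods would come from an LM; here they are toy
# numbers so the sketch is runnable.

def select(candidates, threshold):
    """candidates: dict object -> likelihood; keep those at/above threshold."""
    return {obj for obj, score in candidates.items() if score >= threshold}

def f1(selected, gold):
    """Set-based F1 between selected and gold object sets."""
    if not selected or not gold:
        return 0.0
    tp = len(selected & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(selected), tp / len(gold)
    return 2 * p * r / (p + r)

def learn_threshold(dev_examples):
    """dev_examples: list of (candidates, gold set) pairs for one relation.
    Grid-search the threshold maximizing mean F1 over the dev set."""
    grid = sorted({s for cands, _ in dev_examples for s in cands.values()})
    def mean_f1(t):
        return sum(f1(select(c, t), g) for c, g in dev_examples) / len(dev_examples)
    return max(grid, key=mean_f1)

dev = [({"a": 0.9, "b": 0.4, "c": 0.1}, {"a", "b"})]
print(learn_threshold(dev))  # 0.4
```

Learning one threshold per relation captures the observation that different relations have very different typical object counts and likelihood scales, which a single global cutoff would miss.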